Abstract: To reproduce high-fidelity audio and spatial reverberation characteristics, audio surround systems require filters with long impulse responses. Existing frequency-domain techniques for implementing such long filters, such as the overlap-save method and multi-rate running convolution, carry a high computational cost. Besides computational complexity, algorithmic delay is another important factor that must be reduced in real-time implementations of these systems. In this paper, a mixed single frequency delay line filtering technique is proposed to optimize both factors in multichannel audio crosstalk cancellation, and it is shown analytically how the computations are reduced for different multichannel cases. The proposed method keeps the computational complexity low even for impulse responses longer than 100 msec. Unlike existing methods, its algorithmic delay depends only on the processing frame size and not on the filter length, so the processing delay is short. The proposed technique was implemented on a 32-bit floating-point DSP processor, and an efficient design is provided to achieve processor-level optimization and low implementation complexity. A computational comparison with conventional methods shows that the proposed technique is very efficient for long filters.

I. Introduction

The objective of a 3D audio system is to reproduce high-fidelity sound at the desired locations while preserving the reverberation characteristics and spatial sound pattern of the original audio signal. This technology has many spatial audio applications, such as home theatre entertainment, gaming, teleconferencing and remote control. In 1983, Blauert described the Head Related Transfer Function (HRTF), which measures the free-field sound pressure transformation from a specific location to the ears of the listener. Headphones use HRTFs to convolve the input signal in order to reproduce the spatial audio pattern [1]. Even though headphones offer excellent channel separation and equalization, they are cumbersome and inconvenient when several users are listening at the same time. An alternative is binaural or multichannel loudspeaker technology, which assumes a centered listener at some distance from the loudspeakers. The transmission path is equalized by filtering the input signals with the acoustic inversion matrix (AIF), obtained by inverting the acoustic transfer matrix (ATF) that contains the impulse responses of the intended and unintended sound paths. To ensure that the unintended sounds are cancelled, the product of AIF and ATF should be the unit matrix. This approach is generally applied to two input sources and is called binaural audio crosstalk cancellation (CTC). When multiple channels (or sources) are involved, it becomes multichannel audio CTC, where each output is a linear combination of the source signals filtered by their respective impulse responses [2] [3] [4] [5] [6].

For S audio sources, the outputs are derived as

$$y_L(n) = h_{L1}(n) * x_1(n) + h_{L2}(n) * x_2(n) + \dots + h_{LS}(n) * x_S(n) = \sum_{i=1}^{S} h_{Li}(n) * x_i(n) \tag{1}$$

$$y_R(n) = h_{R1}(n) * x_1(n) + h_{R2}(n) * x_2(n) + \dots + h_{RS}(n) * x_S(n) = \sum_{i=1}^{S} h_{Ri}(n) * x_i(n) \tag{2}$$



To obtain such transmission path equalization, the impulse responses of the AIF matrix may last for several hundred milliseconds, which requires thousands of FIR filter coefficients [5]. Implementing these long filters on real-time DSP processors demands considerable computational power. To overcome the complexity issues, new implementation techniques are needed that do not compromise performance. The aim of this paper is to investigate optimized algorithms that reduce both the computational complexity and the algorithmic delay.

In a typical real-time filtering implementation, the frame length L is chosen much smaller than the filter length M. The standard approach is time-domain convolution, whose major drawback is its high computation count; because of this, a single DSP processor may not be sufficient to support multichannel audio CTC with long filters. Frequency-domain approaches based on the overlap-save and overlap-add methods, on the other hand, offer lower computational complexity. Their disadvantage is the algorithmic delay that arises from appending zeros to the impulse responses to make the FFT length a power of 2.




International Journal of Modern Engineering Research (IJMER) 
www.ijmer.com Vol. 3, Issue 2, March-April 2013, pp. 1088-1096, ISSN: 2249-6645



[Figure 1 labels: inputs x_1(n), x_2(n), ..., x_S(n); filters h_LS(n), h_RS(n); outputs y_L(n), y_R(n); H*C = I (unit matrix), H = C^{-1}]

Fig. 1. The binaural audio CTC system model (left), describing the perception of filtered audio signals at the listener's ears (for two audio sources), and the audio CTC technology with multichannel sources (right) from a real-time implementation point of view.

For instance, if the processing frame length is 256 and the filter has 1024 coefficients, the linear convolution length is 1024 + 256 - 1 = 1279. Because the FFT length must be a power of 2, it is chosen as 2048, which means that 1024 zeros are appended to the impulse response and a delay of 1024 samples appears at the output. In general, a delay of at least the filter length is introduced at the output [7] [8].
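The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `next_power_of_two` and `overlap_save_fft_size` are hypothetical and do not come from the paper.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def overlap_save_fft_size(frame_len: int, filter_len: int) -> int:
    """FFT size for overlap-save: L + M - 1 rounded up to a power of two."""
    return next_power_of_two(frame_len + filter_len - 1)


L, M = 256, 1024                      # frame length and filter length from the text
linear_len = L + M - 1                # 1279
fft_size = overlap_save_fft_size(L, M)
zeros_appended = fft_size - M         # zeros appended to the impulse response

print(linear_len, fft_size, zeros_appended)
```

With L = 256 and M = 1024 this reproduces the figures quoted above: a linear convolution length of 1279, an FFT size of 2048 and 1024 appended zeros.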

Martin Vetterli proposed the multi-rate running convolution algorithm to reduce the delay, but this method achieves no major computational savings and requires more data buffering [9] [10].

Single Frequency Delay Line (SFDL) filtering reduces the delay efficiently by partitioning the filter into equal-sized blocks, applying the overlap-save method to each partition and combining the results of all partitions. It is also called uniformly partitioned convolution because all partitions are of the same size [11][12][13][14]. This technique is briefly reviewed in the next section. If it is applied individually to each filter of Fig. 1, the implementation complexity is high and the internal DSP memory may not hold all the required buffers, particularly for long filters. To avoid these problems, SFDL is combined with mixed filtering [15] in this paper, and the combination is presented as a new algorithm that reduces both computational complexity and processing delay. With efficient memory management and the properties of the FFT, the proposed technique is a very good choice for audio CTC with long filters.

This paper is organized as follows. Section II reviews the mixed filtering and Single Frequency Delay Line filtering methods. Section III presents the combination of these two techniques as the proposed method, together with its theoretical computational complexity. Section IV describes an efficient design that avoids extra buffer copying routines and supports processor-level optimization. Section V gives the experimental details and results, and compares the computational complexity of the proposed method with that of the overlap-save method. Finally, Section VI provides the conclusion and the future scope for extending the proposed method.

II. Review Of Existing Work 

In this section, the background details of mixed filtering and single Frequency Delay Line filtering are explained. 

II.1 Mixed Filtering

The term 'mixed filtering' means that all the filtering operations of CTC can be performed in a single equation. To understand this, let us form a complex sequence from the real-time outputs y_L(n) and y_R(n):

$$y(n) = y_L(n) + j\,y_R(n) = \sum_{i=1}^{S} \left[ h_{Li}(n) + j\,h_{Ri}(n) \right] * x_i(n) \tag{3}$$

The frequency-domain representation of the above equation is obtained as

$$Y(k) = \sum_{i=1}^{S} H_i(k)\,X_i(k), \quad k = 0, 1, \dots, N-1, \quad N = L + M - 1 \tag{4}$$

where the following definitions are used:

$$H_i(k) = F\left[ h_{Li}(n) + j\,h_{Ri}(n) \right], \quad h_i(n) = h_{Li}(n) + j\,h_{Ri}(n)$$

The implementation procedure is simple. In the first step, the FFTs of two consecutive source signals are obtained by applying a single FFT with decomposition [9] [10]. If the system model has S sources, this step requires 0.25*S*O(N) complex multiplications and 0.5*S*(O(N)+2N) complex additions, where O(N) = N log2 N. The second step implements equation (4), i.e. complex frequency multiplication and addition, which requires S*N complex multiplications and (S-1)*N complex additions. Finally, the third step requires a single IFFT, i.e. 0.5*O(N) complex multiplications and O(N) complex additions. The real and imaginary components of the IFFT output yield y_L(n) and y_R(n) respectively. These counts apply to an even number of sources. For an odd number of sources, FFT decomposition is not needed for the last source because the imaginary term of its input is zero. In Reference [15], mixed filtering was explained for two-channel sources. Here, the technique is adapted to the multichannel case.
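The three steps can be illustrated for two sources with the following Python sketch. It is not the authors' implementation: a naive O(N^2) DFT stands in for the FFT, the transform length is N = L + M - 1 without rounding to a power of 2, and the function names (`dft`, `two_real_spectra`, `mixed_filter`, `direct_conv`) are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT; a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def two_real_spectra(x1, x2):
    """Step 1: spectra of two real sequences from one complex DFT (decomposition)."""
    n = len(x1)
    z = dft([complex(a, b) for a, b in zip(x1, x2)])
    X1 = [(z[k] + z[-k % n].conjugate()) / 2 for k in range(n)]
    X2 = [(z[k] - z[-k % n].conjugate()) / 2j for k in range(n)]
    return X1, X2


def mixed_filter(x1, x2, h_left, h_right):
    """Return (y_L, y_R) for two sources; h_left[i], h_right[i] belong to source i."""
    n = len(x1) + len(h_left[0]) - 1            # N = L + M - 1

    def pad(v):
        return list(v) + [0.0] * (n - len(v))

    X = two_real_spectra(pad(x1), pad(x2))
    # H_i(k) = F[h_Li(n) + j h_Ri(n)]
    H = [dft(pad([complex(a, b) for a, b in zip(h_left[i], h_right[i])]))
         for i in range(2)]
    # Step 2: complex frequency multiplication and addition, equation (4)
    Y = [H[0][k] * X[0][k] + H[1][k] * X[1][k] for k in range(n)]
    # Step 3: a single inverse transform; real part -> y_L, imaginary part -> y_R
    y = dft(Y, inverse=True)
    return [v.real for v in y], [v.imag for v in y]


def direct_conv(x, h):
    """Time-domain reference convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y
```

The left and right outputs returned by `mixed_filter` should match, up to rounding error, the sums of the individual time-domain convolutions computed with `direct_conv`, which is the property expressed by equations (3) and (4).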

II.2 Single frequency delay line filtering 

In the overlap-save method, the FFT size is chosen as N = L+M-1 so that the samples of the spectrum are represented uniquely at a discrete set of frequencies. In addition, N must be a power of 2. As a result, the overlap-save method introduces an output delay of at least the filter length M. If the filter length is M = 8192 and the system operates at a 48 kHz sampling frequency, this method introduces a delay of 170.67 msec (8192/48000), which is not acceptable in real-time audio applications. Moreover, its computational complexity grows as the FFT length increases. Single frequency delay line filtering solves both issues efficiently. The method partitions the filter impulse response into equal-sized blocks, applies the overlap-save method to each partition, and sums the outputs of all partitions to yield the filter output. It can be understood from the following explanation [11][12][13][14].

Let x(n), h(n) and y(n) be the input, impulse response and output of an LTI system, respectively. The lengths of the input frame and the filter are L and M. The impulse response is partitioned into m = M/L parts so that each partition has L samples.

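In the z-domain, the impulse response is expressed as

$$
\begin{aligned}
H(z) &= \sum_{n=0}^{M-1} h(n) z^{-n} = \sum_{n=0}^{L-1} h(n) z^{-n} + \sum_{n=L}^{2L-1} h(n) z^{-n} + \dots + \sum_{n=M-L}^{M-1} h(n) z^{-n} \\
&= \sum_{n=0}^{L-1} h(n) z^{-n} + z^{-L} \sum_{n=0}^{L-1} h(n+L) z^{-n} + \dots + z^{-(M-L)} \sum_{n=0}^{L-1} h(n+M-L) z^{-n} \\
&= \sum_{n=0}^{L-1} \left[ h_0(n) + z^{-L} h_1(n) + \dots + z^{-(M-L)} h_{m-1}(n) \right] z^{-n}
\end{aligned} \tag{5}
$$

where

$$
h_0(n) = h(n), \quad h_1(n) = h(n+L), \quad \dots, \quad h_{m-1}(n) = h(n+M-L), \quad n = 0, 1, \dots, L-1, \quad m = M/L \tag{6}
$$

These are called partitioned impulse responses of equal length L. The total number of partitions is M/L.
The output in the z-domain is expressed as

$$
\begin{aligned}
Y(z) = H(z) X(z) &= \sum_{n=0}^{L-1} \left[ h(n) + z^{-L} h(n+L) + \dots + z^{-(M-L)} h(n+M-L) \right] X(z) z^{-n} \\
&= X(z) \sum_{n=0}^{L-1} h_0(n) z^{-n} + z^{-L} X(z) \sum_{n=0}^{L-1} h_1(n) z^{-n} + \dots + z^{-(M-L)} X(z) \sum_{n=0}^{L-1} h_{m-1}(n) z^{-n} \\
&= X(z) H_0(z) + z^{-L} X(z) H_1(z) + \dots + z^{-(M-L)} X(z) H_{m-1}(z)
\end{aligned} \tag{7}
$$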

From the above equation, the output Y(z) is expressed as the sum of all partitioned outputs. For the 1st frame, the FFT of x(n) is calculated. When the 2nd frame arrives, the FFT of the 1st frame is delayed by L samples and becomes the input to the 2nd partitioned filter. During the 3rd frame, the FFT of the 2nd frame becomes the input to the 2nd partitioned filter and the FFT of the 1st frame becomes the input to the 3rd partitioned filter. In this way, only the frequency samples of the input frames are delayed; they are multiplied with the FFTs of the respective partitioned filters and summed before being passed to the IFFT. Because the FFT values of the input samples are delayed and all delays are equal in size, this technique is called single frequency delay line filtering. A block diagram of this method is shown in Fig. 2. As the overlap-save method is applied to each partitioned filter of length L, the FFT size of each filter is L + L - 1 = 2L - 1, which is rounded up to 2L. It is therefore necessary to append L zeros, which results in an output delay of L samples. Hence the delay is reduced from M to L. For our example, the output delay becomes 5.33 msec (256/48000) for a frame length of 256 samples.

For each frame, one FFT and one IFFT of size 2L are required. Each frequency multiplier has length 2L. There are m = M/L such multipliers, so 2L * M/L = 2M complex multiplications are needed. The outputs of all frequency multipliers must be added before they are passed to the IFFT, which requires 2L(m-1) = 2(M - L) complex additions.
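A minimal Python sketch of single frequency delay line filtering (uniformly partitioned overlap-save) for one real input and one real filter is given below. It is an illustration under stated assumptions rather than the paper's DSP code: a naive DFT replaces the FFT, len(h) and len(x) are assumed to be multiples of L, and the names `dft` and `sfdl_filter` are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT; a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def sfdl_filter(x, h, L):
    """Filter x with h frame by frame; returns the first len(x) output samples."""
    m = len(h) // L                                   # number of partitions, m = M/L
    # Spectra of the partitions h_j(n) = h(n + jL), each zero-padded to 2L points
    H = [dft(list(h[j * L:(j + 1) * L]) + [0.0] * L) for j in range(m)]
    fdl = [[0j] * (2 * L) for _ in range(m)]          # frequency delay line
    prev = [0.0] * L                                  # previous input frame
    out = []
    for start in range(0, len(x), L):
        frame = list(x[start:start + L])
        # One forward transform per frame; older spectra shift down the delay line
        fdl = [dft(prev + frame)] + fdl[:-1]
        # Multiply each delayed spectrum by its partition spectrum and accumulate
        Y = [sum(fdl[j][k] * H[j][k] for j in range(m)) for k in range(2 * L)]
        # One inverse transform per frame; the last L samples are the valid output
        y = dft(Y, inverse=True)
        out.extend(v.real for v in y[L:])
        prev = frame
    return out
```

Each frame uses exactly one forward and one inverse transform of size 2L, with m spectral multiplications in between, which matches the operation counts given above.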




Fig. 2. Block diagram interpretation of SFDL. A delay of L samples becomes 2L samples in the FFT domain because the FFT length is N = L+L-1 ≈ 2L.

III. Mixed Single Frequency Delay Line Filtering Algorithm 

The proposed algorithm is the combination of mixed filtering and SFDL applied to the multichannel audio CTC structure of Fig. 1. Its mathematical model is derived as follows. Note that the mathematical model is given in the z-domain and the computational complexity in the FFT domain for easy understanding and clarity.

Let us assume that the CTC model has S sources and that each source is filtered by a complex impulse response of length M. The complex impulse response and its frequency response are represented as



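$$h_i(n) = h_{Li}(n) + j\,h_{Ri}(n), \quad i = 1, 2, \dots, S \tag{8}$$

and

$$H_i(z) = \sum_{n=0}^{M-1} h_i(n) z^{-n} \tag{9}$$

Now each impulse response is partitioned into m = M/L parts of equal length L. The partitioned impulse responses are represented by

$$h_i(n) = \left[ h_{i,0}(n),\ h_{i,1}(n),\ \dots,\ h_{i,m-1}(n) \right], \quad i = 1, 2, \dots, S \tag{10}$$

and their frequency responses are given by

$$H_{i,j}(z) = \sum_{n=0}^{L-1} h_{i,j}(n) z^{-n}, \quad i = 1, 2, \dots, S, \quad j = 0, 1, \dots, m-1 \tag{11}$$

The output complex frequency response is obtained by placing equation (11) in equation (7):

$$
\begin{aligned}
Y(z) &= \sum_{i=1}^{S} H_i(z) X_i(z) \\
&= \sum_{i=1}^{S} \left[ X_i(z) H_{i,0}(z) + z^{-L} X_i(z) H_{i,1}(z) + \dots + z^{-(M-L)} X_i(z) H_{i,m-1}(z) \right] \\
&= \sum_{i=1}^{S} \sum_{j=0}^{M/L-1} z^{-jL} X_i(z) H_{i,j}(z)
\end{aligned} \tag{12}
$$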

For the case of stereo channel inputs, S = 2. For this case, equation (12) becomes

$$Y(z) = \sum_{j=0}^{M/L-1} z^{-jL} X_1(z) H_{1,j}(z) + \sum_{j=0}^{M/L-1} z^{-jL} X_2(z) H_{2,j}(z) \tag{13}$$

The implementation model is shown as a block diagram in Fig. 3. The SFDL filtering algorithm is applied to each input and its associated complex impulse response. After the complex frequency multiplier outputs of all partitioned filters are added, a single IFFT is sufficient to provide the filtered outputs y_L(n) and y_R(n) as the real and imaginary parts of the output y(n).




Fig. 3. Block diagram of Mixed SFDL filtering algorithm 
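The structure of equation (12) can be sketched in Python as follows. This is a simplified illustration, not the authors' DSP implementation: a naive DFT replaces the FFT, each source gets its own forward transform (the paired-FFT decomposition is omitted for brevity), filter and signal lengths are assumed to be multiples of L, and the names `dft` and `msfdl_filter` are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT; a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def msfdl_filter(sources, h_left, h_right, L):
    """sources[i] is filtered by h_left[i] + j*h_right[i]; returns (y_L, y_R)."""
    S = len(sources)
    m = len(h_left[0]) // L
    # Spectra H_{i,j} of the complex partitions, zero-padded to 2L points
    H = [[dft([complex(a, b) for a, b in zip(h_left[i][j * L:(j + 1) * L],
                                             h_right[i][j * L:(j + 1) * L])]
              + [0j] * L)
          for j in range(m)] for i in range(S)]
    fdl = [[[0j] * (2 * L) for _ in range(m)] for _ in range(S)]   # one delay line per source
    prev = [[0.0] * L for _ in range(S)]
    y_left, y_right = [], []
    for start in range(0, len(sources[0]), L):
        Y = [0j] * (2 * L)                          # accumulator shared by all sources
        for i in range(S):
            frame = list(sources[i][start:start + L])
            fdl[i] = [dft(prev[i] + frame)] + fdl[i][:-1]
            prev[i] = frame
            for j in range(m):                      # complex frequency multiplication
                for k in range(2 * L):
                    Y[k] += fdl[i][j][k] * H[i][j][k]
        y = dft(Y, inverse=True)                    # single inverse transform per frame
        y_left.extend(v.real for v in y[L:])        # real part: left output
        y_right.extend(v.imag for v in y[L:])       # imaginary part: right output
    return y_left, y_right
```

Regardless of the number of sources, only one inverse transform is computed per frame, because the left and right filters are packed into the real and imaginary parts of a single complex filter.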



III.1 Theoretical Computational Complexity

The following table provides the computations required by the proposed algorithm. Note that O(N) = N log2 N. The computations are first shown for a pair of input sources and the associated complex impulse responses. In the Remarks column, they are generalized to S sources.



Table 1: Computational complexity of MSFDL algorithm

| To calculate | Complex multiplications | Complex additions | Remarks |
|---|---|---|---|
| X_i(k) & X_{i+1}(k) | 0.5*O(2L) | O(2L) + 4L | FFTs of two consecutive sources can be found using a single FFT with decomposition [7]. When S is even: complex multiplications = 0.25*S*O(2L), complex additions = 0.5*S*(O(2L) + 4L). When S is odd, decomposition is not required for the last source when calculating its FFT, as the imaginary term of the input becomes zero: complex multiplications = 0.25*(S-1)*O(2L) + 0.5*O(2L), complex additions = 0.5*(S-1)*(O(2L) + 4L) + O(2L). |
| X_i(k)H_i(k) | 2L * M/L = 2M | 2(M-L) | This computation is needed for one input source with the associated filters. Each partitioned frequency multiplication requires 2L multiplications. There are M/L such partitions and hence 2M complex multiplications are needed. All partitioned multiplier outputs are to be added, which requires 2(M-L) complex additions. For S sources: complex multiplications = 2*S*M, complex additions = 2*S*(M-L). |
| Y(k) | | (S-1)*2L | This is required due to the addition of individual FFT outputs. Equation (12) can be taken as reference for this calculation. |
| y(n) | 0.5*O(2L) | O(2L) | This computation is required for the IFFT calculation. |

For even sources,

Complex multiplications = 0.25*S*O(2L) + 2*S*M + 0.5*O(2L)
Complex additions = 0.5*S*(O(2L) + 4L) + 2*S*(M-L) + (S-1)*2L + O(2L)

For odd sources,

Complex multiplications = 0.25*(S-1)*O(2L) + O(2L) + 2*S*M
Complex additions = 0.5*(S-1)*(O(2L) + 4L) + 2*O(2L) + 2*S*(M-L) + (S-1)*2L
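The totals above can be evaluated for concrete parameters with a short Python function. The sketch below simply adds up the rows of Table 1 using O(N) = N log2 N; the function names `fft_cost` and `msfdl_complexity` are hypothetical.

```python
import math


def fft_cost(n):
    """O(N) = N * log2(N), as defined for Table 1."""
    return n * math.log2(n)


def msfdl_complexity(S, L, M):
    """Return (complex multiplications, complex additions) per frame for S sources."""
    o = fft_cost(2 * L)
    if S % 2 == 0:
        # Sources are processed in pairs with one FFT plus decomposition
        fft_mults = 0.25 * S * o
        fft_adds = 0.5 * S * (o + 4 * L)
    else:
        # The last (unpaired) source needs a plain FFT without decomposition
        fft_mults = 0.25 * (S - 1) * o + 0.5 * o
        fft_adds = 0.5 * (S - 1) * (o + 4 * L) + o
    cfm_mults = 2 * S * M                 # complex frequency multiplication
    cfm_adds = 2 * S * (M - L)            # adding the partition outputs
    sum_adds = (S - 1) * 2 * L            # adding the per-source spectra
    ifft_mults, ifft_adds = 0.5 * o, o    # single IFFT
    return fft_mults + cfm_mults + ifft_mults, fft_adds + cfm_adds + sum_adds + ifft_adds


print(msfdl_complexity(2, 256, 1024))
print(msfdl_complexity(3, 256, 1024))
```

For L = 256 and M = 1024, the stereo case (S = 2) gives 8704 complex multiplications and 13824 complex additions per frame under these formulas.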



IV. Efficient Memory Management Of MSFDL Filtering Algorithm



The algorithm described so far requires a lot of memory to store the FFT values of the filter coefficients for all channels and the frequency delay buffers. If the filter length is 2048 and all coefficients are stored in 32-bit floating-point format, 32768 bytes of memory are needed for the real and imaginary buffers, 16384 bytes each. The same amount of memory is required for the frequency delay buffers. This is the requirement for a single input channel, so the total grows accordingly in multichannel cases. Most processors have a small internal RAM and a larger external RAM such as SDRAM or SRAM. Because the internal RAM is small, these buffers cannot all be stored in internal DSP memory. The major implementation complexity lies in the complex frequency multiplication shown in Fig. 3. When this method is implemented on real-time DSP processors, the memory buffers must be managed efficiently without increasing the computational complexity. From here on, the term 'complex buffer' denotes the combination of a real buffer and an imaginary buffer.
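The memory figures can be reproduced with the small Python calculation below. It assumes, as in the SFDL scheme, that each of the M/L partitions is zero-padded to 2L frequency points, so each real or imaginary buffer holds 2M values; the function name `spectrum_storage_bytes` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def spectrum_storage_bytes(filter_len):
    """Bytes for the real and imaginary spectrum buffers of one complex filter.

    M/L partitions of 2L points each give 2M values per buffer,
    independent of the frame length L.
    """
    values_per_buffer = 2 * filter_len
    real_bytes = values_per_buffer * BYTES_PER_FLOAT32
    imag_bytes = values_per_buffer * BYTES_PER_FLOAT32
    return real_bytes, imag_bytes, real_bytes + imag_bytes


real_b, imag_b, total_b = spectrum_storage_bytes(2048)
print(real_b, imag_b, total_b)        # filter spectra for one input channel
print(2 * total_b)                    # including equally sized frequency delay buffers
```

For M = 2048 this gives 16384 bytes for each of the real and imaginary buffers, 32768 bytes for the filter spectra, and 65536 bytes per input channel once the frequency delay buffers are included.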

DSP processors typically have a dedicated memory bus for DMA. Using DMA, the contents of external memory can be copied into internal RAM as the processing requires. In this paper, the design is explained based on the architecture of the SHARC DSP processor. The SHARC 214xx series DSP has 5MB of internal memory and 64MB SDRAM. The situation is similar for other floating-point processors.

To implement the complex frequency multiplication for each partition, the proposed algorithm requires two complex buffers of size 2L: one holding the delayed frequency-domain input and one holding the FFT of the filter coefficients. Assume that the FFT values of all complex filters are stored in external RAM. For each partition, the real and imaginary buffers are arranged sequentially, i.e. a real buffer of size 2L followed by an imaginary buffer of size 2L for the 1st partition, then the same order of buffers for the 2nd partition, and so on. In the same way, the frequency delay buffers are arranged in an independent memory section of external RAM.

For processing, assume that two sets of dedicated internal memory sections are reserved for real and imaginary buffers, as shown in Fig. 4. Initially, the complex buffers of the (m-1)th partition and those of the 1st input frame are copied into the first set of dedicated internal memory sections using DMA. After this copy is completed, DMA is enabled to copy the complex buffers of the (m-2)th partition and the 1st delayed complex input frame buffers into the 2nd set of dedicated memory sections. As the two memory sections are independent in external RAM, two separate DMA channels are allocated, say channel 0 for the complex filter buffers and channel 1 for the delayed complex input frame buffers. While the DMA transfer is in progress, the DSP core performs the complex frequency multiplication on the first set, so both operations run in parallel. The execution time is therefore the maximum of the DMA time and the core processing time. Once the core process is completed, the DMA transfer must also be checked for completion. After the DMA is completed, it is configured again to copy the (m-3)th partitioned complex frequency buffers and the 2nd delayed complex input frame buffers into the first set of dedicated memory sections. While the DSP runs its core process, the DMA copy proceeds on a separate memory bus. After the copy is over, the DMA is verified for completion. In this way, the dedicated memory sections are filled alternately with external complex buffers. This continues until all partitions are completed and is repeated for all channels.

The implementation steps are summarized below.

During initialization,

Calculate the FFTs of all partitioned filters with an FFT size of 2L each. Repeat the same process for all filter sets based on the input source. Store these FFT values in external memory in the sequence shown in Fig. 4.

For each frame,

1) Receive the 1st input frame of L samples for channels 1 and 2. Calculate their FFTs using a complex FFT with decomposition. Store these FFT values in the external memory buffers using DMA.

2) Fill the output buffer with zeros to store the real and imaginary outputs of the complex frequency multiplication (CFM).

3) Copy the complex buffers of the filters and of the input frames into the internal dedicated memory sections using DMA. Verify DMA completion.

4) Enable DMA again to copy the next partition's complex buffers and the 1st delayed buffer contents into the 2nd set of dedicated internal memory sections. Start the core process to compute the CFM of the 1st set of dedicated memory buffer contents.

5) After the core process is completed, verify DMA completion.

6) Continue steps 3 and 4 until all partitions are completed, alternating the internal memory sections used for storing the complex buffer contents. In each iteration, add the complex output of the CFM to the output buffer.

7) Repeat steps 1 to 5 for all input sources.

8) After the CFM of all sources is completed, calculate the IFFT of the output and send the real and imaginary buffer contents of L samples as the filtered outputs.

[Figure 4 labels: FFT delay buffers stored in external RAM; FFT coefficients of each partition stored in external RAM (Real 0, Imag 0, Real 1, Imag 1, ..., Real (m-1), Imag (m-1)); two sets of dedicated internal RAM buffers (Real Buffer 1, Imag Buffer 1, Real Buffer 2, Imag Buffer 2) for delayed FFT values and FFT coefficients. Initially, one set of real and imaginary buffers of size 2L each is copied from external to internal memory. While the complex multiplication is running, another set of buffers is copied using DMA in parallel with the core process.]

Fig. 4. Block diagram showing the efficient memory management for Mixed SFDL filtering on SHARC DSP processors using DMA processing. The diagram shows the processing for just the initial two frames. The output buffer is initially filled with zeros before starting the complex frequency multiplication. This is applicable for one input source with the associated complex buffers. The same is repeated for S sources in general.
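The double-buffered ("ping-pong") schedule described in the steps above and in Fig. 4 can be sketched in Python as follows. This is only a sequential simulation of the schedule: on the DSP the transfer and the multiplication run concurrently, whereas here `dma_copy` is a hypothetical stand-in that copies immediately, and the pairing of partition j with delay-line entry j is a simplifying assumption.

```python
def dma_copy(external_buffer):
    """Hypothetical stand-in for a DMA transfer from external to internal RAM."""
    return list(external_buffer)


def cfm_ping_pong(ext_filter_spectra, ext_delay_spectra):
    """Accumulate sum_j H_j(k) * X_j(k) using two alternating internal buffer sets."""
    m = len(ext_filter_spectra)            # number of partitions
    n = len(ext_filter_spectra[0])         # spectrum length (2L in the paper)
    out = [0j] * n                         # output buffer filled with zeros
    internal = [None, None]                # the two internal buffer sets
    # Initial transfer of the first filter/delay pair into set 0
    internal[0] = (dma_copy(ext_filter_spectra[0]), dma_copy(ext_delay_spectra[0]))
    for j in range(m):
        cur, nxt = j % 2, (j + 1) % 2
        if j + 1 < m:
            # Start the transfer of the next pair into the idle set
            internal[nxt] = (dma_copy(ext_filter_spectra[j + 1]),
                             dma_copy(ext_delay_spectra[j + 1]))
        # Core process: complex frequency multiplication on the current set
        H, X = internal[cur]
        for k in range(n):
            out[k] += H[k] * X[k]
        # On hardware, the DMA completion flag would be checked here
    return out
```

On real hardware the time per partition is then roughly the larger of the transfer time and the multiplication time, which is why the two are balanced in the implementation.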



V. Experimental Results And Discussion 

The proposed algorithm, Mixed Single Frequency Delay Line filtering, was implemented on the ADSP SHARC 21469 Ez-Kit Lite board [16] following the procedure described in Section IV. The frame length was L = 256. The frequency multiplication was implemented efficiently with 4 parallel instructions inside the loop and SIMD (Single Instruction Multiple Data) operation. For the FFT and IFFT, the built-in FFT software modules (available with the installation package) were used.

During the DMA process, the execution times of the DMA and of the core process (i.e. complex frequency multiplication) should be almost the same. The DMA execution time depends on the size of the buffer to be copied (here 4L, i.e. real and imaginary buffers) and on the clock ratio of the DSP and the external RAM. The core process was optimized with the multiply-add and SIMD instructions of the SHARC architecture so that its execution time does not exceed that of the DMA process. Acoustic filter lengths from 1024 to 16384 in steps of 512 were considered. The computational complexity, in terms of mega peak cycle count, was recorded for different numbers of input channels. The mega peak cycle counts at various filter lengths for different input channels are given in Table 2.




[Figure 5: graph; horizontal axis: length of acoustic filter, M]

Fig. 5. Comparison of computational complexity between the mixed overlap-save method and the MSFDL algorithm for different filter lengths with stereo channel inputs. The frame length of the experiment is 256.

The same experiment was carried out with the mixed overlap-save method for stereo inputs. The computational complexity of this method and of the proposed method is compared in Fig. 5. The graph shows that the mixed overlap-save method needs more computations than the proposed method. Its curve also resembles a staircase: the computational complexity rises suddenly at power-of-2 filter lengths and stays constant until the next power of 2. This is expected because the FFT/IFFT size is derived as N = L+M-1 in the overlap-save method; if N is not a power of 2, the next power of 2 is chosen.



Table 2: MSFDL algorithm - Mega peak cycle count (rounded to 5 digits after the decimal point) at different filter lengths with various input channels. The frame length was taken as 256.

| Length of acoustic filters, M | 2ch input | 3ch input | 4ch input | 5ch input | Length of acoustic filters, M | 2ch input | 3ch input | 4ch input | 5ch input |
|---|---|---|---|---|---|---|---|---|---|
| 1024 | 0.04594 | 0.06693 | 0.08393 | 0.10491 | 8704 | 0.26713 | 0.39870 | 0.52629 | 0.65787 |
| 1536 | 0.06069 | 0.08905 | 0.11342 | 0.14178 | 9216 | 0.28187 | 0.42082 | 0.55579 | 0.69474 |
| 2048 | 0.07543 | 0.11117 | 0.14291 | 0.17864 | 9728 | 0.29662 | 0.44294 | 0.58528 | 0.7316 |
| 2560 | 0.09018 | 0.13329 | 0.17241 | 0.21550 | 10240 | 0.31136 | 0.46506 | 0.61477 | 0.76846 |
| 3072 | 0.10492 | 0.15540 | 0.20189 | 0.25237 | 10752 | 0.32611 | 0.48718 | 0.64426 | 0.80533 |
| 3584 | 0.11967 | 0.17752 | 0.23138 | 0.28923 | 11264 | 0.34085 | 0.5093 | 0.67375 | 0.84219 |
| 4096 | 0.13442 | 0.19964 | 0.26087 | 0.32609 | 11776 | 0.3556 | 0.53141 | 0.70324 | |

Fig. 6. Comparison of the computational complexity of the MSFDL algorithm for different filter lengths with various input sources. The frame length of the experiment is 256.



The variation of computational complexity for different numbers of input sources is shown in Fig. 6. The graph shows a linear relationship between the filter length and the computational complexity, with a slope that increases with the number of sources. This is expected because, as the number of sources increases, both the computations in the complex frequency multiplication and the number of FFTs increase.
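The linear trend can be checked directly on the 2ch column of Table 2. The Python sketch below uses only values listed in the table; the variable names are illustrative.

```python
# 2ch-input mega peak cycle counts from Table 2 (frame length 256)
lengths = [1024, 1536, 2048, 2560, 3072, 3584, 4096]
cycles_2ch = [0.04594, 0.06069, 0.07543, 0.09018, 0.10492, 0.11967, 0.13442]

# Increment per 512-tap step; a linear relationship gives near-constant steps
steps = [b - a for a, b in zip(cycles_2ch, cycles_2ch[1:])]
print([round(s, 5) for s in steps])

# Slope from the end points, then a linear extrapolation to M = 8704
slope_per_tap = (cycles_2ch[-1] - cycles_2ch[0]) / (lengths[-1] - lengths[0])
predicted_8704 = cycles_2ch[0] + slope_per_tap * (8704 - lengths[0])
print(round(predicted_8704, 5))       # Table 2 lists 0.26713 for M = 8704
```

The increments stay within about 0.00001 of each other, and the straight-line extrapolation lands within rounding distance of the tabulated value at M = 8704, which is consistent with the linear behaviour described above.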

A floating-point processor was chosen for the implementation because of the implementation complexity and computational load of the algorithm. In high-end applications such as audio surround systems, fixed-point processors are not encouraged because of their lower output quality.

VI. Conclusion

An efficient algorithm for the implementation of multichannel audio crosstalk cancellation was presented in this paper by combining mixed filtering and single frequency delay line filtering into the MSFDL algorithm. To reduce memory issues on DSP processors during implementation, an efficient design was provided that makes use of the resources of the DSP processor. The results were compared against the mixed overlap-save method for various filter lengths, and they indicate that the proposed technique yields better results, particularly at long filter lengths. The variation of computational complexity for different numbers of input sources was also shown.

The future scope of this work is mixed multiple frequency delay line filtering, in which the impulse response is partitioned non-uniformly and all non-uniform filters run in parallel. This requires a multicore DSP for implementation and is suitable for filter lengths of more than 16384. The main computations of the proposed algorithm are in the complex frequency multiplication block, so methods that reduce the computations in this block are worth investigating. It is also worth investigating suitable frequency-domain methods other than FFT-based techniques. Algorithm optimization alone is not enough; processor-level optimization is also very important to achieve low computational complexity. For this, a DSP processor that can handle parallel instructions should be chosen.